The Size Conundrum: Why Online Knowledge Markets Can Fail at Scale
In this paper, we interpret the community question answering websites on the
StackExchange platform as knowledge markets, and analyze how and why these
markets can fail at scale. A knowledge market framing allows site operators to
reason about market failures, and to design policies to prevent them. Our goal
is to provide insights on large-scale knowledge market failures through an
interpretable model. We explore a set of interpretable economic production
models on a large empirical dataset to analyze the dynamics of content
generation in knowledge markets. Amongst these, the Cobb-Douglas model best
explains empirical data and provides an intuitive explanation for content
generation through concepts of elasticity and diminishing returns. Content
generation depends both on user participation and on how specific types of
content (e.g., answers) depend on other types (e.g., questions). We show that
these factors of content generation have constant elasticity---a percentage
increase in any of the inputs leads to a constant percentage increase in the
output. Furthermore, markets exhibit diminishing returns---the marginal output
decreases as the input is incrementally increased. Knowledge markets also vary
on their returns to scale---the increase in output resulting from a
proportionate increase in all inputs. Importantly, many knowledge markets
exhibit diseconomies of scale---measures of market health (e.g., the percentage
of questions with an accepted answer) decrease as a function of number of
participants. The implications of our work are two-fold: site operators ought
to design incentives as a function of system size (number of participants); the
market lens should shed insight into complex dependencies amongst different
content types and participant actions in general social networks.
Comment: The 27th International Conference on World Wide Web (WWW), 2018
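The Cobb-Douglas model's core properties can be illustrated with a short sketch. The coefficients below are purely illustrative, not the paper's fitted values, and the inputs (users and questions) stand in for whatever factors a given market's answer production actually depends on:

```python
# Hypothetical Cobb-Douglas content-generation model: answer volume as a
# function of active users and available questions. Coefficients are
# illustrative only, not fitted values from the paper.

def cobb_douglas(users, questions, A=1.0, alpha=0.6, beta=0.3):
    """Predicted answer volume: A * users^alpha * questions^beta."""
    return A * users**alpha * questions**beta

# Constant elasticity: a 1% increase in users yields a fixed ~alpha%
# increase in output, regardless of scale.
base = cobb_douglas(1000, 500)
bumped = cobb_douglas(1010, 500)  # +1% users
elasticity = (bumped / base - 1) / 0.01  # approximately alpha = 0.6

# Diminishing returns: with alpha < 1, each additional user contributes
# less marginal output than the previous one.
marginal_early = cobb_douglas(1001, 500) - cobb_douglas(1000, 500)
marginal_late = cobb_douglas(2001, 500) - cobb_douglas(2000, 500)
assert marginal_late < marginal_early

# Returns to scale: alpha + beta = 0.9 < 1 here, so doubling all inputs
# less than doubles output (decreasing returns to scale, a diseconomy).
assert cobb_douglas(2000, 1000) < 2 * base
```

The returns-to-scale check captures the paper's diseconomies-of-scale finding in miniature: when the sum of the exponents falls below one, proportionate growth in all inputs yields less-than-proportionate growth in output.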
Towards high quality, scalable education: Techniques in automated assessment and probabilistic user behavior modeling
There are two primary challenges for instructors in offering a high-quality course at large scale. The first is scaling educational experiences to such a large audience. The second is enabling adaptivity of the educational experience. This thesis addresses both challenges to high-quality, scalable education by developing new techniques for large-scale automated assessment (addressing scalability) and new models for interpretable user behavior analysis in educational environments, improving the quality of interaction via personalized education.
Specifically, I perform a study of automated assessment of complex assignments where I explore the effectiveness of different types of features in a feasibility study. I argue for re-framing automated assessment techniques in these more complex contexts as a ranking problem, and provide a systematic approach for integrating expert, peer, and automated assessment techniques via an active-learning-to-rank formulation that outperforms a traditional randomized training solution.
I also present the design and implementation of CLaDS---a Cloud-based Lab for Data Science---to enable students to engage with real-world data science problems at scale with minimal cost ($7.40/student). I discuss our experience with deploying seven major text data assignments for students in both on-campus and online courses and show that the general infrastructure of CLaDS can be used to efficiently deliver a wide range of hands-on data science assignments.
Understanding student behavior is necessary for improving the quality of scalable education through adaptivity. To this end, I present two general user behavior models for analyzing student interaction log data to understand student behavior. The first focuses on the discovery and analysis of action-based roles in community question answering (CQA) platforms using a generative model called the MDMM behavior model. I show interesting distinctions within CQA communities in question-asking behavior (where two distinct types of askers can be identified) and answering behavior (where two distinct roles surrounding answers emerge). Second, I find that where there are statistically significant differences in health metrics across topical groups on StackExchange, there are also statistically significant differences in behavior compositions, suggesting a relationship between behavior composition and health. Third, I show that the MDMM behavior model can be used to demonstrate similar but distinct evolutionary patterns between topical groups.
The second model focuses on discovering temporal action patterns of learners in Coursera MOOCs. I present a two-layer hidden Markov model (2L-HMM) to extract a multi-resolution summary of user behavior patterns and their evolution, and show that these patterns can be used to extract latent features that correlate with educational outcomes.
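The two-layer structure described above can be sketched generatively. All transition and emission matrices below are made-up placeholders, not fitted Coursera parameters: the upper layer governs transitions between latent behavior modes across sessions, while each mode owns a lower-layer HMM over within-session actions:

```python
import numpy as np

# Minimal generative sketch of a two-layer HMM (2L-HMM) over learner
# actions. All probabilities are illustrative placeholders.
rng = np.random.default_rng(1)
ACTIONS = ["lecture", "quiz", "forum"]

# Upper layer: transitions between latent behavior modes across sessions.
upper_trans = np.array([[0.8, 0.2],
                        [0.3, 0.7]])

# Lower layer: one (transition, emission) HMM per upper-layer mode.
lower_hmms = [
    (np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])),
    (np.array([[0.5, 0.5], [0.5, 0.5]]),
     np.array([[0.2, 0.2, 0.6], [0.3, 0.5, 0.2]])),
]

def sample_trajectory(n_sessions, session_len=5):
    """Sample a learner trajectory: a mode per session, actions within it."""
    mode, trajectory = 0, []
    for _ in range(n_sessions):
        trans, emit = lower_hmms[mode]
        state, actions = 0, []
        for _ in range(session_len):
            actions.append(ACTIONS[rng.choice(3, p=emit[state])])
            state = rng.choice(2, p=trans[state])
        trajectory.append((mode, actions))
        mode = rng.choice(2, p=upper_trans[mode])  # next session's mode
    return trajectory

trajectory = sample_trajectory(n_sessions=4)
assert len(trajectory) == 4 and all(len(a) == 5 for _, a in trajectory)
```

The multi-resolution summary comes from reading the two layers separately: the sequence of modes characterizes coarse behavioral evolution across sessions, while the per-session action sequences characterize fine-grained behavior within each mode.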
Finally, I develop the Piazza Educational Role Mining (PERM) system to close the gap between theory and practice by providing an easy-to-use web-based interface for leveraging probabilistic user behavior models on Piazza CQA interaction data. PERM allows instructors to easily crawl their courses and run subsequent MDMM behavior analyses on them. Analyses provide instructors with insight into the common user behavior patterns (roles) uncovered by plotting their action distributions in a browser. PERM enables instructors to perform deep-dives into an individual role by viewing the concrete sessions that the model has assigned to that role, along with each session's individual actions and associated content. This allows instructors to flexibly combine data-driven statistical inference (through the MDMM behavior model) with a qualitative understanding of the behavior within a role. Finally, PERM models individual users as mixtures over the discovered roles, which instructors can likewise examine in depth to see exactly what individual users were doing on the platform.
A Generative Model for Discovering Action-Based Roles and Community Role Compositions on Community Question Answering Platforms
This paper proposes a generative model for discovering user roles and community role compositions in Community Question Answering (CQA) platforms. While past research shows that participants play different roles in online communities, automatically discovering these roles and providing a summary of user behavior that is readily interpretable remains an important challenge. Furthermore, there has been relatively little insight into the distribution of these roles between communities. Does a community’s composition over user roles vary as a function of topic? How does it relate to the health of the underlying community? Does role composition evolve over time? The generative model proposed in this paper, the mixture of Dirichlet-multinomial mixtures (MDMM) behavior model, can (1) automatically discover interpretable user roles (as probability distributions over atomic actions) directly from log data, and (2) uncover community-level role compositions to facilitate such cross-community studies.
A comprehensive experiment on all 161 non-meta communities on the StackExchange CQA platform demonstrates that our model can be useful for a wide variety of behavioral studies, and we highlight three empirical insights. First, we show interesting distinctions in question-asking behavior on StackExchange (where two distinct types of askers can be identified) and answering behavior (where two distinct roles surrounding answers emerge). Second, we find statistically significant differences in behavior compositions across topical groups of communities on StackExchange, and that those groups that have statistically significant differences in health metrics also have statistically significant differences in behavior compositions, suggesting a relationship between behavior composition and health. Finally, we show that the MDMM behavior model can be used to demonstrate similar but distinct evolutionary patterns between topical groups.
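The generative story behind an MDMM-style model can be sketched in a few lines. Everything here is an illustrative toy (hyperparameters, action vocabulary, role count), not the paper's implementation: each community draws a role composition from a Dirichlet prior, each user session draws one role from that composition, and the role's action distribution then generates the session's atomic actions:

```python
import numpy as np

# Toy generative sketch of an MDMM-style behavior model. Hyperparameters,
# actions, and role count are illustrative, not the paper's settings.
rng = np.random.default_rng(0)

ACTIONS = ["ask", "answer", "comment", "edit", "vote"]
N_ROLES = 3

# Role-level action distributions: one Dirichlet draw per role, shared
# across all communities so roles are comparable between them.
role_action_dists = rng.dirichlet(alpha=np.ones(len(ACTIONS)), size=N_ROLES)

def sample_community(n_sessions, actions_per_session=10):
    """Generate one community's sessions as (role, action-count) pairs."""
    # Community-level role composition: a point on the role simplex.
    composition = rng.dirichlet(alpha=np.ones(N_ROLES))
    sessions = []
    for _ in range(n_sessions):
        role = rng.choice(N_ROLES, p=composition)      # session-level role
        counts = rng.multinomial(actions_per_session,  # actions in session
                                 role_action_dists[role])
        sessions.append((role, counts))
    return composition, sessions

composition, sessions = sample_community(n_sessions=100)
assert all(counts.sum() == 10 for _, counts in sessions)
```

Because roles are probability distributions over atomic actions and compositions are probability distributions over roles, both layers can be inspected directly, which is what makes the cross-community comparisons in the experiments above possible.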